Probabilistic Feature Selection and Classification Vector Machine
Authors
Abstract
Sparse Bayesian learning is a state-of-the-art family of machine learning algorithms that can make stable and reliable probabilistic predictions. However, some of these algorithms, e.g. the probabilistic classification vector machine (PCVM) and the relevance vector machine (RVM), cannot eliminate irrelevant and redundant features, which can degrade performance. To tackle this problem, we propose a sparse Bayesian classifier that simultaneously selects the relevant samples and features. We name this classifier the probabilistic feature selection and classification vector machine (PFCVM); it employs truncated Gaussian distributions as both sample and feature priors. To derive an analytical solution for the proposed algorithm, we use the Laplace approximation to calculate approximate posteriors and marginal likelihoods. We then obtain the optimized parameters and hyperparameters by the type-II maximum likelihood method. Experiments on a synthetic data set, benchmark data sets and high-dimensional data sets validate the performance of PFCVM under two criteria: classification accuracy and efficacy of the selected features. Finally, we analyze the generalization performance of PFCVM and derive a generalization error bound for it. By tightening the bound, we demonstrate the significance of sparseness for the model.
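As a rough illustration of how a truncated Gaussian prior can encourage sparsity, the sketch below evaluates a zero-mean Gaussian density truncated to non-negative values. The abstract says only that truncated Gaussians are used as priors; the truncation point at zero, the precision values and the names truncated_gaussian_pdf and integrate are assumptions made for this example, not the paper's definitions.

```python
import math


def truncated_gaussian_pdf(w, precision):
    """Zero-mean Gaussian density with the given precision, truncated to w >= 0.

    Assumption for illustration: the truncation point is zero.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    if w < 0:
        return 0.0
    # Twice the ordinary Gaussian density, because half of the mass is cut away.
    norm = 2.0 * math.sqrt(precision / (2.0 * math.pi))
    return norm * math.exp(-0.5 * precision * w * w)


def integrate(f, lo, hi, steps=20000):
    """Trapezoidal rule on [lo, hi] with equally spaced points."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


# Total mass should be close to 1 for any positive precision.
mass_small = integrate(lambda w: truncated_gaussian_pdf(w, 1.0), 0.0, 10.0)
mass_large = integrate(lambda w: truncated_gaussian_pdf(w, 100.0), 0.0, 10.0)

# Share of the prior mass that lies within 0.1 of zero.
near_zero_small = integrate(lambda w: truncated_gaussian_pdf(w, 1.0), 0.0, 0.1)
near_zero_large = integrate(lambda w: truncated_gaussian_pdf(w, 100.0), 0.0, 0.1)
```

Under these assumptions, a larger precision hyperparameter concentrates far more prior mass near zero (near_zero_large exceeds near_zero_small), which is the usual mechanism by which sparse Bayesian models prune a weight.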
Similar resources
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data offer a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to several bioscience databases with various records, attributes and classes; this strategy therefore combines the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...
Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Applying Combined Approach of Sequential Floating Forward Selection and Support Vector Machine to Predict Financial Distress of Listed Companies in Tehran Stock Exchange Market
Objective: Financial distress prediction is one of the most important research issues in risk management and has long been of interest to banks, companies, corporations, managers and investors. The main objective of this study is to develop a high-performance predictive model and to compare the results with other commonly used models in financial distress prediction M...
Journal title:
- CoRR
Volume abs/1609.05486, issue (none listed)
Pages -
Publication date 2016